// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
= Memory and JVM Tuning

This article provides memory tuning best practices that apply to deployments both with and without native persistence or external storage.
Even though Ignite stores data and indexes off the Java heap, the Java heap is still used to store objects generated by
queries and operations executed by your applications.
Thus, certain recommendations should be considered for JVM and garbage collection (GC) related optimizations.

[NOTE]
====
[discrete]
Refer to the link:perf-and-troubleshooting/persistence-tuning[persistence tuning] article for disk-related
optimization practices.
====

== Tune Swappiness Setting

An operating system starts swapping pages from RAM to disk when overall RAM usage hits a certain threshold.
Swapping can significantly degrade Ignite cluster performance.
You can adjust the operating system's settings to prevent this from happening.
On Unix systems, the best option is to either decrease the `vm.swappiness` parameter to `10`, or set it to `0` if native persistence is enabled:

[source,shell]
----
sysctl -w vm.swappiness=0
----

A high value of this setting can prolong GC pauses as well. For instance, if your GC logs show `low user time, high
system time, long GC pause` records, the cause might be Java heap pages being swapped in and out. To
address this, use the `swappiness` settings above.
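The `sysctl -w` command above takes effect only until the next reboot. To persist the value, you can also place it in a sysctl configuration file. The `/etc/sysctl.d/99-swappiness.conf` file name below is only an example convention; the drop-in directory may differ on your distribution:

[source,shell]
----
# /etc/sysctl.d/99-swappiness.conf - applied at boot,
# or on demand via `sysctl --system`
vm.swappiness=10
----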

== Share RAM with OS and Apps

An individual machine's RAM is shared among the operating system, Ignite, and other applications.
As a general recommendation, if an Ignite cluster is deployed in pure in-memory mode (native
persistence is disabled), then you should not allocate more than 90% of RAM capacity to Ignite nodes.

On the other hand, if native persistence is used, then the OS requires extra RAM for its page cache in order to optimally sync up data to disk.
If the page cache is not disabled, then you should not give more than 70% of the server's RAM to Ignite.

Refer to link:memory-configuration/data-regions[memory configuration] for configuration examples.

In addition to that, because using native persistence might cause high page cache utilization, the `kswapd` daemon, which performs page reclamation for the page cache in the background, might not keep up.
As a result, the application can fall back to direct page reclamation, which causes high latencies and can lead to long GC pauses.
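To confirm that the page cache is consuming a large share of RAM, and to see whether direct reclamation is occurring, you can inspect `/proc` counters. These are read-only checks available on any Linux kernel; exact counter names can vary slightly between kernel versions:

[source,shell]
----
# How much RAM the OS page cache is currently using (read-only)
grep -E '^(MemTotal|MemAvailable|Cached)' /proc/meminfo

# Reclamation activity: steadily growing pgscan_direct counters
# indicate that kswapd is not keeping up
grep -E '^pgscan_' /proc/vmstat
----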

To work around the effects caused by page memory reclamation on Linux, add extra bytes between `wmark_min` and `wmark_low` with `/proc/sys/vm/extra_free_kbytes`:

[source,shell]
----
sysctl -w vm.extra_free_kbytes=1240000
----

Refer to link:https://events.static.linuxfound.org/sites/events/files/lcjp13_moriya.pdf[this resource, window=_blank]
for more insight into the relationship between page cache settings, high latencies, and long GC pauses.

== Java Heap and GC Tuning

Even though Ignite keeps data in its own off-heap memory regions that are invisible to the Java garbage collector, the Java
heap is still used for objects generated by your application workloads.
For instance, whenever you run SQL queries against an Ignite cluster, the queries access data and indexes stored in
off-heap memory, while the result sets of such queries are kept in the Java heap until your application reads them.
Thus, depending on the throughput and the type of operations, the Java heap can still be utilized heavily, and this might require
JVM and GC related tuning for your workloads.

We've included some common recommendations and best practices below.
Feel free to start with them and make further adjustments as necessary, depending on the specifics of your applications.
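As an illustration, JVM options can be passed to a server node started via `ignite.sh` through the `JVM_OPTS` environment variable, which the startup script picks up. The heap size and flags below are only an example starting point:

[source,shell]
----
# Example: start a server node with a 10 GB heap and the G1 collector
export JVM_OPTS="-Xms10g -Xmx10g -XX:+UseG1GC -XX:+AlwaysPreTouch"
$IGNITE_HOME/bin/ignite.sh
----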

[NOTE]
====
[discrete]
Refer to the link:perf-and-troubleshooting/troubleshooting#debugging-gc-issues[GC debugging techniques] section for best
practices on collecting GC logs and heap dumps.
====

=== Generic GC Settings

Below are example sets of JVM configurations for applications that heavily utilize the Java heap on server nodes, thus
triggering long, or frequent short, stop-the-world GC pauses.

For JDK 1.8+ deployments, you should use the G1 garbage collector.
The settings below are a good starting point if a 10 GB heap is sufficient for your server nodes:

[source,shell]
----
-server
-Xms10g
-Xmx10g
-XX:+AlwaysPreTouch
-XX:+UseG1GC
-XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC
----

If G1 does not work for you, consider using the CMS collector and starting with the following settings.
Note that the 10 GB heap is used as an example; a smaller heap can be enough for your use case:

[source,shell]
----
-server
-Xms10g
-Xmx10g
-XX:+AlwaysPreTouch
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled
-XX:+ScavengeBeforeFullGC
-XX:+CMSScavengeBeforeRemark
-XX:+DisableExplicitGC
----

[NOTE]
====
If you use link:persistence/native-persistence[Ignite native persistence], we recommend that you set the
`MaxDirectMemorySize` JVM parameter to `walSegmentSize * 4`.
With the default WAL settings, this value is equal to 256MB.
====
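Assuming the default `walSegmentSize` of 64 MB, the formula works out as shown below. Adjust the value accordingly if you have changed the WAL segment size:

[source,shell]
----
# walSegmentSize * 4 = 64 MB * 4 = 256 MB
-XX:MaxDirectMemorySize=256m
----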

=== Advanced Memory Tuning

In Linux and Unix environments, an application can face long GC pauses or lower performance because of
I/O or memory starvation caused by kernel-specific settings.
This section provides some guidelines on how to modify kernel settings in order to overcome long GC pauses.

[WARNING]
====
[discrete]
All the shell commands given below were tested on RedHat 7.
They may differ for your Linux distribution.
Before changing the kernel settings, make sure to check the system statistics/logs to confirm that you really have a problem.
Consult your IT department before making changes at the Linux kernel level in production.
====

If GC logs show `low user time, high system time, long GC pause` records, then most likely memory constraints are triggering swapping or scanning for free memory.

* Check and adjust the link:perf-and-troubleshooting/memory-tuning#tune-swappiness-setting[swappiness settings].
* Add `-XX:+AlwaysPreTouch` to JVM settings on startup.
* Disable NUMA zone-reclaim optimization.
+
[source,shell]
----
sysctl -w vm.zone_reclaim_mode=0
----

* Turn off Transparent Huge Pages if a RedHat distribution is used.
+
[source,shell]
----
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
----
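The paths above are specific to older RedHat kernels. On distributions with mainline kernels, the Transparent Huge Pages control files usually live under `/sys/kernel/mm/transparent_hugepage` instead; check the path on your system before applying the change:

[source,shell]
----
# Check the current THP mode first; the active value is shown in brackets
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP on mainline kernels (requires root)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
----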

=== Advanced I/O Tuning

If GC logs show `low user time, low system time, long GC pause` records, then GC threads might be spending too much time in kernel space, blocked by various I/O activities.
For instance, this can be caused by journal commits, gzip compression, or log rollover procedures.

As a solution, you can try changing the page flushing interval from the default 30 seconds to 5 seconds:

[source,shell]
----
sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500
----
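You can verify the currently effective intervals, and how much dirty memory is waiting to be written back, with the following read-only commands (run before and after the change):

[source,shell]
----
# Current flushing intervals (read-only, no root required)
sysctl vm.dirty_writeback_centisecs vm.dirty_expire_centisecs

# Amount of dirty memory pending writeback
grep -E '^(Dirty|Writeback):' /proc/meminfo
----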

[NOTE]
====
[discrete]
Refer to the link:perf-and-troubleshooting/persistence-tuning[persistence tuning] section for the optimizations related to disk.
Those optimizations can have a positive impact on GC.
====
